334 research outputs found

Annual report

Author: Intel Corporation
Publication venue: [S.l.] : Intel,

Diposit Digital de Documents de la UAB

Global citizenship report

Author: Intel Corporation
Publication venue: [S.l.] : Intel,

Diposit Digital de Documents de la UAB

Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators

Author: Dwarakinath A.
GPU
Gupta V.
Intel Corporation
Kato S.
Kato S.
Kyriazis G.
Menychtas K.
Shen K.
Soares L.
Publication date: 04/09/2014

Today’s operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control that guarantees fairness and Disengaged Fair Queueing that is effective in limiting resource idleness, but probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct devic

CiteSeerX

Crossref

Efficient Power Gating of SIMD Accelerators Through Dynamic Selective Devectorization in an HW/SW Codesigned Environment

Author: Baron M.
Bira Calin
D'Arcy Paul
Intel Corporation
Pavlou Demos
Sathaye Sumedh
Standard Performance Evaluation Corporation
Tschanz James W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2014

Crossref

Edinburgh Research Explorer

Parallel matrix transpose algorithms on distributed memory concurrent computers

Author: Azari
Bokhari
Choi
Choi
Choi
David W. Walker
Dongarra
Eklundh
Golub
Intel Corporation
Jack J. Dongarra
Jaeyoung Choi
Johnsson
Littlefield
O'Leary
Strang
Takkella
Publication venue: 'Elsevier BV'

Crossref

Improving the reliability of commodity operating systems

Author: Brian N. Bershad
Chen P.
Custer H.
Forin A.
Gosling J.
Hand S. M.
Henry M. Levy
Intel Corporation
Levy H. M.
Michael M. Swift
Ng W. T.
Organick E. I.
UDI.
Young M.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

Author: A Szalkowski
A Wirawan
A Wozniak
Intel Corporation
ITS Li
JR Miller
M Farrar
O Gotoh
S Henikoff
SF Altschul
SF Altschul
SM Rumble
T Rognes
TF Smith
Torbjørn Rognes
UniProt Consortium
W Rudnicki
Y Liu
Y Liu
Ł Ligowski
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results: A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64-bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions: Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
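
For readers unfamiliar with the recurrence behind this abstract, the following is a minimal scalar sketch of Smith-Waterman scoring in Python. It is not SWIPE's implementation: the function names, the match/mismatch scores and the linear gap penalty are illustrative assumptions (the tools named in the abstract use substitution matrices such as BLOSUM50 or BLOSUM62). The inner loop computes one cell update, the unit counted by GCUPS; an inter-sequence SIMD approach, as the abstract describes it, would compare one query residue against residues from several database sequences at once instead of looping over them one at a time.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=2):
    """Return the best local alignment score of query against target.

    The scoring values are illustrative assumptions, not a real
    substitution matrix, and gaps are charged a flat per-residue cost.
    """
    prev = [0] * (len(target) + 1)   # previous row of the score matrix
    best = 0
    for q in query:
        curr = [0]                   # column 0 is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] - gap       # gap in the target
            left = curr[j - 1] - gap # gap in the query
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def search_database(query, database):
    """Score the query against each (name, sequence) pair, one at a time.

    This sequential loop is the part an inter-sequence SIMD scheme would
    parallelise across database sequences.
    """
    return [(name, smith_waterman_score(query, seq)) for name, seq in database]


if __name__ == "__main__":
    db = [("seq1", "TTACGTTT"), ("seq2", "GGGG")]
    for name, score in search_database("ACGT", db):
        print(name, score)
```

Only two rows of the score matrix are kept in memory at a time, which is sufficient when only the best score, and not the alignment itself, is needed.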

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
Freescale Semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
José M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

Message length effects for solving polynomial systems on a hypercube

Author: Allgower
Billups
Businger
Chen
Chern
Chow
Cloete
Cosnard
Ellis
Gajski
Gentzsch
Heller
Intel Corporation
Kaneda
Kowalik
Kowalik
Kubicek
Lakshmivarahan
Layne T Watson
Morgan
Morgan
Morgan
Parkinson
Reed
Rheinboldt
Rice
Schwandt
Seitz
Shampine
Sips
Watson
Watson
Watson
Watson
White
White
Wolfgang Pelz
Publication venue: 'Elsevier BV'
Publication date: 01/04/1989

Polynomial systems of equations frequently arise in solid modelling, robotics, computer vision, chemistry, chemical engineering, and mechanical engineering. Locally convergent iterative methods such as quasi-Newton methods may diverge or fail to find all meaningful solutions of a polynomial system. Recently a homotopy algorithm has been proposed for polynomial systems that is guaranteed globally convergent (always converges from an arbitrary starting point) with probability one, finds all solutions to the polynomial system, and has a large amount of inherent parallelism. For this homotopy algorithm and a given decomposition strategy, the communication overhead for several possible communication strategies is explored empirically in this paper. The experiments were conducted on an iPSC-32 hypercube. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/27982/1/0000415.pd

Crossref

Deep Blue Documents at the University of Michigan

Practical flow cytometry, second edition, by Howard M. Shapiro. Alan R. Liss, Inc., New York, 1988, 370 pages, $59.50

Author: Fetterhoff
Intel Corporation
Intel Corporation
Murphy
Publication venue: 'Wiley'

Crossref